Take Home Exercise 1

Author

Hao Xian

Published

January 30, 2023

Modified

February 12, 2023

Overview

Setting the Scene

Important

This context is taken from the IS415 Take Home Exercise 1

All rights belong to Dr Kam Tin Seong.

Water is an important resource to mankind. Clean and accessible water is critical to human health. It provides a healthy environment, a sustainable economy, reduces poverty and ensures peace and security. Yet over 40% of the global population does not have access to sufficient clean water. By 2025, 1.8 billion people will be living in countries or regions with absolute water scarcity, according to UN-Water. The lack of water poses a major threat to several sectors, including food security. Agriculture uses about 70% of the world’s accessible freshwater.

Developing countries are most affected by water shortages and poor water quality. Up to 80% of illnesses in the developing world are linked to inadequate water and sanitation. Despite technological advancement, providing clean water to the rural community is still a major development issue in many countries globally, especially countries on the African continent.

To address the issue of providing a clean and sustainable water supply to the rural community, a global Water Point Data Exchange (WPdx) project has been initiated. The main aim of this initiative is to collect water point related data from rural areas at the water point or small water scheme level and share the data via the WPdx Data Repository, a cloud-based data library. What is special about this project is that the data are collected based on the WPDx Data Standard.

Objectives

Important

These objectives are taken from the IS415 Take Home Exercise 1

All rights belong to Dr Kam Tin Seong.

Exploratory Spatial Data Analysis

  • Derive kernel density maps of functional and non-functional water points.

  • Using appropriate tmap functions, display the kernel density maps on OpenStreetMap of Osun State, Nigeria.

  • Describe the spatial patterns revealed by the kernel density maps. Highlight the advantage of kernel density map over point map.

Second-Order Spatial Point Patterns Analysis

With reference to the spatial point patterns observed in ESDA:

  • Formulate the null hypothesis and alternative hypothesis and select the confidence level.

  • Perform the test by using appropriate Second order spatial point patterns analysis technique.

  • With reference to the analysis results, draw statistical conclusions.

Spatial Correlation Analysis

In this section, you are required to confirm statistically whether the spatial distributions of functional and non-functional water points are independent of each other.

  • Formulate the null hypothesis and alternative hypothesis and select the confidence level.

  • Perform the test by using appropriate Second order spatial point patterns analysis technique.

  • With reference to the analysis results, draw statistical conclusions.

Setup

Packages

  • sf: used for importing, managing, and processing geospatial data

  • tidyverse: for performing data science tasks such as importing, wrangling and visualising data.

  • tmap: used for creating thematic maps, such as choropleth and bubble maps

  • spatstat: used for point pattern analysis

  • raster: reads, writes, manipulates, analyses and models gridded spatial data (i.e. raster-based geographical data)

  • maptools: a set of tools for manipulating geographic data

  • funModeling: contains a set of functions related to exploratory data analysis, data preparation, and model performance

Installing and Loading the Packages

The code chunk below will be used to install and load these packages in RStudio.

pacman::p_load(maptools, sf, raster, spatstat, tmap, tidyverse, funModeling)

This prepares all the tools necessary for us to start our spatial analysis.

Dataset used

Two datasets are used for this exercise:

  1. The first dataset is the Level 2 Administrative Boundary of Nigeria, which can be obtained from either geoBoundaries or the Humanitarian Data Exchange.

  2. The second dataset is the water point data from the WPdx Data Repository.

Handling the Geospatial Data

Importing Geospatial Dataframe

Note

We need to double check the CRS, as it depends on the coordinate system used by the country.

Since the country we are focusing on is Nigeria, the EPSG code used is 26392 (Minna / Nigeria Mid Belt), which we apply to the whole of Nigeria.

We will be using the st_read() function from the sf package to read the data set. More information on st_read() can be found in the sf documentation.

However, the boundary data is stored in a geographic coordinate system (WGS 84), so the geometries need to be reprojected to the projected CRS above. st_transform() from the sf package is used to do so. More information on st_transform() can be found in the sf documentation.
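As a quick illustration of what the reprojection does, here is a minimal sketch on a single toy point. The coordinates are illustrative only (roughly where Osun State lies) and are not taken from the datasets; the point is simply to show how st_crs() can be used to confirm the CRS before and after st_transform().

```r
library(sf)

# Toy point given as longitude/latitude (WGS 84, EPSG:4326).
# The coordinates are illustrative and not taken from the data.
pt_wgs84 <- st_sfc(st_point(c(4.56, 7.77)), crs = 4326)

# Reproject to Minna / Nigeria Mid Belt (EPSG:26392); units become metres.
pt_proj <- st_transform(pt_wgs84, crs = 26392)

st_crs(pt_wgs84)$epsg    # EPSG code before the transformation
st_crs(pt_proj)$epsg     # EPSG code after the transformation
st_coordinates(pt_proj)  # projected coordinates in metres
```

Checking st_crs() after every import is a cheap way to catch layers that are still in degrees before any distance-based analysis is run.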

geoBoundaries data set

The code chunk below loads the boundaries of Nigeria from geoBoundaries.

geoNGA <- st_read("data/geospatial/",
                  layer = "geoBoundaries-NGA-ADM2") %>%
  st_transform(crs = 26392)
Reading layer `geoBoundaries-NGA-ADM2' from data source 
  `C:\hxchen-2019\birdie\lessons\Take-home\Take-home_ex1\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 774 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 2.668534 ymin: 4.273007 xmax: 14.67882 ymax: 13.89442
Geodetic CRS:  WGS 84

geoNGA contains the following data:

Column Name   Description
shapeName     Name of the Level 2 boundary
pcode         Unique code
level         ADM2 (indicating this is a Level 2 boundary)
shapeID       Unique code of the shape
shapeGroup    NGA (indicating Nigeria)
shapeType     ADM2 (indicating this is a Level 2 boundary)
geometry      Polygon data

NGA Data set (Humanitarian Data Exchange)

Note

The NGA dataset is essentially the same as the geoBoundaries dataset, except that the geoBoundaries dataset is more condensed (it has fewer attribute fields).

NGA <- st_read("data/geospatial/",
               layer = "nga_admbnda_adm2_osgof_20190417") %>%
  st_transform(crs = 26392)
Reading layer `nga_admbnda_adm2_osgof_20190417' from data source 
  `C:\hxchen-2019\birdie\lessons\Take-home\Take-home_ex1\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 774 features and 16 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 2.668534 ymin: 4.273007 xmax: 14.67882 ymax: 13.89442
Geodetic CRS:  WGS 84

NGA contains the following data:

Column Name   Description
Shape_Leng    Length of the shape
Shape_Area    Area of the shape
ADM2_EN       English name of ADM2
ADM2_PCODE    Unique ID of ADM2
ADM2_REF      A reference to ADM2_EN
ADM2ALT1EN    Alternative English name
ADM2ALT2EN    Alternative English name
ADM1_EN       English name of ADM1
ADM1_PCODE    Unique ID of ADM1
ADM0_EN       English name of ADM0
ADM0_PCODE    Unique ID of ADM0
date          Date of the boundaries
validOn       Date from which the boundaries are valid
validTo       Date until which the boundaries are valid
SD_EN         Senatorial District
SD_PCODE      Unique code of the Senatorial District
geometry      Polygon data

Important

As the NGA data set offers richer attributes, the rest of the analysis will be done on the NGA data set.

Importing Aspatial Data

Loading the dataset from CSV

The next dataset that we will load is the water point dataset. As this dataset is stored as a CSV file, we use another function, read_csv(), which imports the CSV as a tibble data frame. Read more about read_csv() in the readr documentation.

Note

As the CSV contains 70 variables and more than 400,000 observations, it is better to filter the dataset to the country of interest, in this case, Nigeria. Read more about filter() in the dplyr documentation.

wp_nga <- read_csv("data/aspatial/WPdx.csv") %>%
  filter(`#clean_country_name` == "Nigeria")
Warning: One or more parsing issues, call `problems()` on your data frame for details,
e.g.:
  dat <- vroom(...)
  problems(dat)
Rows: 406566 Columns: 70
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (43): #source, #report_date, #status_id, #water_source_clean, #water_sou...
dbl (23): row_id, #lat_deg, #lon_deg, #install_year, #fecal_coliform_value, ...
lgl  (4): #rehab_year, #rehabilitator, is_urban, latest_record

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Converting the water point data into sf point feature.

Although the aspatial data has been loaded into a tibble data frame, we still need to convert it into an sf data frame before we can perform geospatial analysis.

The column “New Georeferenced Column” contains the spatial data as a well-known text (WKT) representation of geometry. As such, the function st_as_sfc() can be used to convert it into an sfc object. Read more about st_as_sfc() in the sf documentation. We will store the sfc object in a new column called “Geometry”.

wp_nga$Geometry = st_as_sfc(wp_nga$`New Georeferenced Column`)
wp_nga
# A tibble: 95,008 × 71
   row_id `#source`      #lat_…¹ #lon_…² #repo…³ #stat…⁴ #wate…⁵ #wate…⁶ #wate…⁷
    <dbl> <chr>            <dbl>   <dbl> <chr>   <chr>   <chr>   <chr>   <chr>  
 1 429068 GRID3             7.98    5.12 08/29/… Unknown <NA>    <NA>    Tapsta…
 2 222071 Federal Minis…    6.96    3.60 08/16/… Yes     Boreho… Well    Mechan…
 3 160612 WaterAid          6.49    7.93 12/04/… Yes     Boreho… Well    Hand P…
 4 160669 WaterAid          6.73    7.65 12/04/… Yes     Boreho… Well    <NA>   
 5 160642 WaterAid          6.78    7.66 12/04/… Yes     Boreho… Well    Hand P…
 6 160628 WaterAid          6.96    7.78 12/04/… Yes     Boreho… Well    Hand P…
 7 160632 WaterAid          7.02    7.84 12/04/… Yes     Boreho… Well    Hand P…
 8 642747 Living Water …    7.33    8.98 10/03/… Yes     Boreho… Well    Mechan…
 9 642456 Living Water …    7.17    9.11 10/03/… Yes     Boreho… Well    Hand P…
10 641347 Living Water …    7.20    9.22 03/28/… Yes     Boreho… Well    Hand P…
# … with 94,998 more rows, 62 more variables: `#water_tech_category` <chr>,
#   `#facility_type` <chr>, `#clean_country_name` <chr>, `#clean_adm1` <chr>,
#   `#clean_adm2` <chr>, `#clean_adm3` <chr>, `#clean_adm4` <chr>,
#   `#install_year` <dbl>, `#installer` <chr>, `#rehab_year` <lgl>,
#   `#rehabilitator` <lgl>, `#management_clean` <chr>, `#status_clean` <chr>,
#   `#pay` <chr>, `#fecal_coliform_presence` <chr>,
#   `#fecal_coliform_value` <dbl>, `#subjective_quality` <chr>, …
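To make the WKT conversion more concrete, the sketch below applies the same calls to a two-row toy tibble. The rows and WKT strings are made up for illustration; only the column name “New Georeferenced Column” comes from the dataset.

```r
library(sf)
library(tibble)

# Toy table with a WKT column; the values are hypothetical.
toy <- tibble(
  row_id = c(1, 2),
  `New Georeferenced Column` = c("POINT (5.12 7.98)", "POINT (3.60 6.96)")
)

# Parse the WKT strings into an sfc (simple feature geometry) column.
toy$Geometry <- st_as_sfc(toy$`New Georeferenced Column`)
class(toy$Geometry)

# Promote the tibble to an sf object and declare that the coordinates
# are longitude/latitude in WGS 84 (this assigns a CRS, it does not reproject).
toy_sf <- st_sf(toy, crs = 4326)
st_crs(toy_sf)$epsg
```

Note that WKT points are written as x then y, so longitude comes before latitude.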

Now that we have a tibble data frame with a geometry column, we need to convert it into an sf object using st_sf(). Read more about st_sf() in the sf documentation.

Important

It is important to note that the sfc object in the Geometry column does not yet carry any coordinate reference system. As the coordinates are recorded in longitude and latitude, we need to assign WGS 84 to it. The EPSG code is 4326.

wp_sf <- st_sf(wp_nga, crs=4326)
wp_sf
Simple feature collection with 95008 features and 70 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 2.707441 ymin: 4.301812 xmax: 14.21828 ymax: 13.86568
Geodetic CRS:  WGS 84
# A tibble: 95,008 × 71
   row_id `#source`      #lat_…¹ #lon_…² #repo…³ #stat…⁴ #wate…⁵ #wate…⁶ #wate…⁷
 *  <dbl> <chr>            <dbl>   <dbl> <chr>   <chr>   <chr>   <chr>   <chr>  
 1 429068 GRID3             7.98    5.12 08/29/… Unknown <NA>    <NA>    Tapsta…
 2 222071 Federal Minis…    6.96    3.60 08/16/… Yes     Boreho… Well    Mechan…
 3 160612 WaterAid          6.49    7.93 12/04/… Yes     Boreho… Well    Hand P…
 4 160669 WaterAid          6.73    7.65 12/04/… Yes     Boreho… Well    <NA>   
 5 160642 WaterAid          6.78    7.66 12/04/… Yes     Boreho… Well    Hand P…
 6 160628 WaterAid          6.96    7.78 12/04/… Yes     Boreho… Well    Hand P…
 7 160632 WaterAid          7.02    7.84 12/04/… Yes     Boreho… Well    Hand P…
 8 642747 Living Water …    7.33    8.98 10/03/… Yes     Boreho… Well    Mechan…
 9 642456 Living Water …    7.17    9.11 10/03/… Yes     Boreho… Well    Hand P…
10 641347 Living Water …    7.20    9.22 03/28/… Yes     Boreho… Well    Hand P…
# … with 94,998 more rows, 62 more variables: `#water_tech_category` <chr>,
#   `#facility_type` <chr>, `#clean_country_name` <chr>, `#clean_adm1` <chr>,
#   `#clean_adm2` <chr>, `#clean_adm3` <chr>, `#clean_adm4` <chr>,
#   `#install_year` <dbl>, `#installer` <chr>, `#rehab_year` <lgl>,
#   `#rehabilitator` <lgl>, `#management_clean` <chr>, `#status_clean` <chr>,
#   `#pay` <chr>, `#fecal_coliform_presence` <chr>,
#   `#fecal_coliform_value` <dbl>, `#subjective_quality` <chr>, …

As with the geospatial data above, we need to convert the water points from WGS 84 to the projected coordinate system of Nigeria as well.

wp_sf <- wp_sf %>%
  st_transform(crs = 26392)

Geospatial Data Cleaning

At this point, all the datasets have been loaded, so the next step is to clean the data.

Excluding Redundant Fields

Looking at the columns of the NGA sf data frame (see NGA Data set (Humanitarian Data Exchange) above), we can see that most of the fields are redundant. The only fields that really matter are the following columns.

Columns to Keep   Reasons
ADM2_EN           English name of the ADM2, i.e. the Local Government Area (LGA).
ADM2_PCODE        Unique identifier of the ADM2.
ADM1_EN           English name of the ADM1, i.e. the state of Nigeria.
ADM1_PCODE        Unique identifier of the ADM1.

NGA <- NGA %>%
  select(c(3:4, 8:9))

Checking for Duplicate Name

We need to ensure that there are no duplicate names in the data. In this case, we only care about duplicate names among the Local Government Areas (ADM2). One way to check for duplicated names is the duplicated() function. Find out more about duplicated() in the base R documentation.

NGA$ADM2_EN[duplicated(NGA$ADM2_EN)==TRUE]
[1] "Bassa"    "Ifelodun" "Irepodun" "Nasarawa" "Obi"      "Surulere"

Now that we know some names are repeated, we need to examine the duplicated fields more closely. One way to check whether the duplicated records refer to the same area is to look at their unique pcode.

NGA$ADM2_PCODE[duplicated(NGA$ADM2_PCODE)==TRUE]
character(0)

Now that we have established that every ADM2_PCODE is different, we can conclude that the repeated ADM2 names refer to different areas. In this case, we need to correct the values of ADM2_EN so that there are no duplicate names.

Tip

A Google search can also be performed to double check that they are indeed different areas.

NGA$ADM2_EN[94] <- "Bassa, Kogi"
NGA$ADM2_EN[95] <- "Bassa, Plateau"
NGA$ADM2_EN[304] <- "Ifelodun, Kwara"
NGA$ADM2_EN[305] <- "Ifelodun, Osun"
NGA$ADM2_EN[355] <- "Irepodun, Kwara"
NGA$ADM2_EN[356] <- "Irepodun, Osun"
NGA$ADM2_EN[519] <- "Nasarawa, Kano"
NGA$ADM2_EN[520] <- "Nasarawa, Nasarawa"
NGA$ADM2_EN[546] <- "Obi, Benue"
NGA$ADM2_EN[547] <- "Obi, Nasarawa"
NGA$ADM2_EN[693] <- "Surulere, Lagos"
NGA$ADM2_EN[694] <- "Surulere, Oyo"

Now, we need to confirm that the duplicated name issue has been addressed.

NGA$ADM2_EN[duplicated(NGA$ADM2_EN)==TRUE]
character(0)
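The renaming above relies on hard-coded row numbers, which would silently break if the row order of the boundary file changed. As an alternative, here is a minimal base R sketch that appends the state name to every duplicated LGA name. The toy table and its pcode values are placeholders, not rows from the real NGA data.

```r
# Toy LGA table mirroring the structure of NGA (pcodes are placeholders).
lga <- data.frame(
  ADM2_EN    = c("Bassa", "Bassa", "Zuru"),
  ADM1_EN    = c("Kogi", "Plateau", "Kebbi"),
  ADM2_PCODE = c("X001", "X002", "X003")
)

# Flag every row whose name occurs more than once
# (duplicated() alone only flags the second and later occurrences).
is_dup <- lga$ADM2_EN %in% lga$ADM2_EN[duplicated(lga$ADM2_EN)]

# Append the state name to the duplicated LGA names only.
lga$ADM2_EN[is_dup] <- paste(lga$ADM2_EN[is_dup], lga$ADM1_EN[is_dup], sep = ", ")

lga$ADM2_EN
any(duplicated(lga$ADM2_EN))
```

This produces the same "name, state" labels as the manual fix, but it depends on ADM1_EN still being present in the data frame.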

Data Wrangling for Water Point Data

Before we go about extracting the relevant details from the Water Point Data, we could perform some Exploratory Data Analysis to gain some initial understanding of the data.

Note

Note that we need to use the sf Dataframe for most of the analysis.

We can view the distribution of the water point through the use of freq() function of funModeling package. Find out about freq() from funModeling here.

freq(data = wp_sf,
     input = '#status_clean')
Warning: The `<scale>` argument of `guides()` cannot be `FALSE`. Use "none" instead as
of ggplot2 3.3.4.
ℹ The deprecated feature was likely used in the funModeling package.
  Please report the issue at <https://github.com/pablo14/funModeling/issues>.

                     #status_clean frequency percentage cumulative_perc
1                       Functional     45883      48.29           48.29
2                   Non-Functional     29385      30.93           79.22
3                             <NA>     10656      11.22           90.44
4      Functional but needs repair      4579       4.82           95.26
5 Non-Functional due to dry season      2403       2.53           97.79
6        Functional but not in use      1686       1.77           99.56
7         Abandoned/Decommissioned       234       0.25           99.81
8                        Abandoned       175       0.18           99.99
9 Non functional due to dry season         7       0.01          100.00

Let us perform some analysis on the status of the water points. There seem to be 3 broad categories of water point based on their status:

  • functional

  • non-functional

  • unknown.

However, some data wrangling needs to be performed to make the data easier to handle in subsequent steps: the status column is renamed, only that column is kept, and missing values are recoded as "unknown".

wp_sf_nga <- wp_sf %>% 
  rename(status_clean = '#status_clean') %>%
  select(status_clean) %>%
  mutate(status_clean = replace_na(
    status_clean, "unknown"))

Extracting the Water Point Data

With a basic understanding of the status values, we can now categorise the water point data. Based on the frequency chart above, the statuses are grouped as follows.

Waterpoint Category   Status
Functional            Functional
Functional            Functional but not in use
Functional            Functional but needs repair
Non Functional        Abandoned/Decommissioned
Non Functional        Abandoned
Non Functional        Non-Functional due to dry season
Non Functional        Non-Functional
Non Functional        Non functional due to dry season
Unknown               unknown (missing values recoded above)

This is to extract functional water points.

wp_functional <- wp_sf_nga %>%
  filter(status_clean %in%
           c("Functional",
             "Functional but not in use",
             "Functional but needs repair"))
wp_functional
Simple feature collection with 52148 features and 1 field
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 29322.63 ymin: 33758.37 xmax: 1218553 ymax: 1092629
Projected CRS: Minna / Nigeria Mid Belt
# A tibble: 52,148 × 2
   status_clean            Geometry
 * <chr>                <POINT [m]>
 1 Functional   (128394.3 330487.9)
 2 Functional   (464684.4 94532.59)
 3 Functional   (588792.3 74102.03)
 4 Functional     (459153.3 171705)
 5 Functional   (586703.9 75701.92)
 6 Functional      (612461.7 87149)
 7 Functional     (503439 87320.23)
 8 Functional   (599467.7 92205.82)
 9 Functional   (651470.8 101586.9)
10 Functional   (650819.3 104796.9)
# … with 52,138 more rows

This is to extract nonfunctional water point.

wp_nonfunctional <- wp_sf_nga %>%
  filter(status_clean %in%
           c("Abandoned/Decommissioned",
             "Abandoned",
             "Non-Functional due to dry season",
             "Non-Functional",
             "Non functional due to dry season"))
wp_nonfunctional
Simple feature collection with 32204 features and 1 field
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 28907.91 ymin: 33736.93 xmax: 1209690 ymax: 1092883
Projected CRS: Minna / Nigeria Mid Belt
# A tibble: 32,204 × 2
   status_clean                        Geometry
 * <chr>                            <POINT [m]>
 1 Abandoned/Decommissioned (578642.2 141523.1)
 2 Abandoned/Decommissioned (571655.4 70856.98)
 3 Abandoned/Decommissioned   (571629.5 143544)
 4 Abandoned/Decommissioned (608748.8 141693.1)
 5 Abandoned/Decommissioned (576876.2 66860.76)
 6 Abandoned/Decommissioned   (698288 224655.8)
 7 Abandoned/Decommissioned (698293.1 224809.4)
 8 Abandoned/Decommissioned (341287.7 459644.6)
 9 Abandoned/Decommissioned (402193.2 89488.33)
10 Abandoned/Decommissioned (589410.8 147917.3)
# … with 32,194 more rows

This is to extract unknown water point

wp_unknown <- wp_sf_nga %>%
  filter(status_clean == "unknown")
wp_unknown
Simple feature collection with 10656 features and 1 field
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 29143.21 ymin: 36660.5 xmax: 1293293 ymax: 965811.9
Projected CRS: Minna / Nigeria Mid Belt
# A tibble: 10,656 × 2
   status_clean            Geometry
 * <chr>                <POINT [m]>
 1 unknown      (297874.6 441473.8)
 2 unknown      (607559.4 274905.5)
 3 unknown      (576523.1 301556.6)
 4 unknown      (578321.7 307339.8)
 5 unknown      (590994.2 326738.8)
 6 unknown      (597909.2 333608.5)
 7 unknown      (724171.9 367609.1)
 8 unknown      (737994.1 350616.5)
 9 unknown      (749790.1 354304.6)
10 unknown      (728109.9 367079.1)
# … with 10,646 more rows
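The three filters above can also be expressed as a single category column, which is sometimes easier to check. The sketch below is one possible way to do it with dplyr::case_when() on a toy tibble. The column name category and the toy rows are assumptions made for illustration; the status labels are the ones listed in the table above.

```r
library(dplyr)
library(tidyr)

# Toy status column with one missing value (rows are hypothetical).
toy <- tibble(status_clean = c("Functional",
                               "Functional but needs repair",
                               "Non-Functional",
                               "Abandoned",
                               NA))

functional_levels <- c("Functional",
                       "Functional but not in use",
                       "Functional but needs repair")
nonfunctional_levels <- c("Abandoned/Decommissioned",
                          "Abandoned",
                          "Non-Functional due to dry season",
                          "Non-Functional",
                          "Non functional due to dry season")

toy_cat <- toy %>%
  # Recode missing statuses first, as done for wp_sf_nga.
  mutate(status_clean = replace_na(status_clean, "unknown")) %>%
  # Map each detailed status to one of the three broad categories.
  mutate(category = case_when(
    status_clean %in% functional_levels    ~ "Functional",
    status_clean %in% nonfunctional_levels ~ "Non Functional",
    TRUE                                   ~ "Unknown"
  ))

count(toy_cat, category)
```

With such a column, a single count() gives the size of each group, and any status label that is not listed falls into "Unknown" instead of being dropped.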

EDA on waterpoints

Now that we have extracted the water point data, we can have a better look at water points from each category.

This is for Functional Water point

freq(data = wp_functional,
     input = 'status_clean')

                 status_clean frequency percentage cumulative_perc
1                  Functional     45883      87.99           87.99
2 Functional but needs repair      4579       8.78           96.77
3   Functional but not in use      1686       3.23          100.00

This is for Non Functional Water Point

freq(data = wp_nonfunctional,
     input = 'status_clean')

                      status_clean frequency percentage cumulative_perc
1                   Non-Functional     29385      91.25           91.25
2 Non-Functional due to dry season      2403       7.46           98.71
3         Abandoned/Decommissioned       234       0.73           99.44
4                        Abandoned       175       0.54           99.98
5 Non functional due to dry season         7       0.02          100.00

This is Unknown Water Point

freq(data = wp_unknown,
     input = 'status_clean')

  status_clean frequency percentage cumulative_perc
1      unknown     10656        100             100

Performing Point In Polygon Count.

While knowing the total number of functional, non-functional and unknown water points is important, it is more useful to see how many water points of each status fall within each LGA.

To do that, st_intersects() is used to find the water points that fall inside each LGA polygon, and lengths() then counts them. This is repeated for all water points and for each status category.

The counts are added as new columns to a new sf data frame, “NGA_wp”, to be used in subsequent steps.

NGA_wp <- NGA %>% 
  mutate(`total_wp` = lengths(
    st_intersects(NGA, wp_sf_nga))) %>%
  mutate(`wp_functional` = lengths(
    st_intersects(NGA, wp_functional))) %>%
  mutate(`wp_nonfunctional` = lengths(
    st_intersects(NGA, wp_nonfunctional))) %>%
  mutate(`wp_unknown` = lengths(
    st_intersects(NGA, wp_unknown)))
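To show how the point-in-polygon count works, here is a small sketch with two toy squares standing in for LGAs and three toy points standing in for water points. All geometries are made up and use plain planar coordinates with no CRS.

```r
library(sf)

# Helper that builds a unit square whose lower-left corner is at (x0, 0).
unit_square <- function(x0) {
  st_polygon(list(rbind(c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1),
                        c(x0, 1), c(x0, 0))))
}

# Two adjacent squares, "A" and "B".
polys <- st_sf(name = c("A", "B"),
               geometry = st_sfc(unit_square(0), unit_square(1)))

# Three points: two fall in A, one falls in B.
pts <- st_sfc(st_point(c(0.2, 0.5)),
              st_point(c(0.7, 0.3)),
              st_point(c(1.5, 0.5)))

# st_intersects() returns, for each polygon, the indices of the points inside it;
# lengths() turns that list into a count per polygon.
hits <- st_intersects(polys, pts)
polys$total_wp <- lengths(hits)
polys
```

One caveat of this approach is that a point lying exactly on a shared boundary intersects both polygons and would be counted twice.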

Visualising Attributes

Now that we have the point in polygon count we can reveal the distribution of the waterpoint.

We can make use of the ggplot2 package to plot the distributions. ggplot2 is part of tidyverse, and more information can be found in the ggplot2 documentation.

Distribution of Total Water point

ggplot(data = NGA_wp,
       aes(x = total_wp)) + 
  geom_histogram(bins=20,
                 color="black",
                 fill="light blue") +
  geom_vline(aes(xintercept=mean(
    total_wp, na.rm=T)),
             color="red", 
             linetype="dashed", 
             size=0.8) +
  ggtitle("Distribution of total water points by LGA") +
  xlab("No. of water points") +
  ylab("No. of\nLGAs") +
  theme(axis.title.y=element_text(angle = 0))
Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` instead.

Distribution of Functional Water point

ggplot(data = NGA_wp,
       aes(x = wp_functional)) + 
  geom_histogram(bins=20,
                 color="black",
                 fill="light blue") +
  geom_vline(aes(xintercept=mean(
    wp_functional, na.rm=T)),
             color="red", 
             linetype="dashed", 
             size=0.8) +
  ggtitle("Distribution of functional water points by LGA") +
  xlab("No. of water points") +
  ylab("No. of\nLGAs") +
  theme(axis.title.y=element_text(angle = 0))

Distribution of Non Functional Water point

ggplot(data = NGA_wp,
       aes(x = wp_nonfunctional)) + 
  geom_histogram(bins=20,
                 color="black",
                 fill="light blue") +
  geom_vline(aes(xintercept=mean(
    wp_nonfunctional, na.rm=T)),
             color="red", 
             linetype="dashed", 
             size=0.8) +
  ggtitle("Distribution of non-functional water points by LGA") +
  xlab("No. of water points") +
  ylab("No. of\nLGAs") +
  theme(axis.title.y=element_text(angle = 0))

Distribution of Unknown Water point

ggplot(data = NGA_wp,
       aes(x = wp_unknown)) + 
  geom_histogram(bins=20,
                 color="black",
                 fill="light blue") +
  geom_vline(aes(xintercept=mean(
    wp_unknown, na.rm=T)),
             color="red", 
             linetype="dashed", 
             size=0.8) +
  ggtitle("Distribution of unknown water points by LGA") +
  xlab("No. of water points") +
  ylab("No. of\nLGAs") +
  theme(axis.title.y=element_text(angle = 0))

Saving the Data into RDS format

Now that we have done all the work above to obtain sf objects that are ready for geospatial analytics, it would be a shame if all the hard work were lost. As such, it is important that we save the data in rds format.

We can make use of the write_rds() function from readr package to export the sf dataframe into a rds format. Find out more about write_rds() function from readr package here.

Note

RDS is an R-native data format, which allows an R object to be saved for later use.

write_rds(NGA_wp, "data/rds/NGA_wp.rds")
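As a small sanity check of the save and load round trip, the sketch below writes a toy data frame to a temporary folder and reads it back. The folder and file names are placeholders; in the exercise itself the target is data/rds/NGA_wp.rds, and that folder has to exist before write_rds() is called.

```r
library(readr)

# Create the output folder first; write_rds() does not create missing folders.
out_dir <- file.path(tempdir(), "rds")
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)

# Toy object standing in for NGA_wp.
toy <- data.frame(lga = c("A", "B"), total_wp = c(2L, 1L))

# Save the object, then read it back in.
path <- file.path(out_dir, "toy.rds")
write_rds(toy, path)
restored <- read_rds(path)

# The restored object should be identical to the original.
identical(toy, restored)
```

In later sessions, read_rds() can then be used to load the saved object instead of repeating the import and cleaning steps.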

Mapping the Functional and Non Functional Points

Now that we have saved the data in RDS format, let us plot the functional and non-functional water points on an interactive map.

We are making use of tmap package to help us in the plotting of the map. Find out more about tmap package here.

tmap_mode("view")
tmap mode set to interactive viewing
tm_shape(NGA) +
  tm_polygons() +
tm_shape(wp_functional)+ 
  tm_dots(col = "status_clean",
             size = 0.01,
             border.col = "black",
             border.lwd = 0.5,
          palette = "blue") + 
  tm_shape(wp_nonfunctional)+ 
  tm_dots(col = "status_clean",
             size = 0.01,
             border.col = "black",
             border.lwd = 0.5,
          palette = "red") +
  tm_view(set.zoom.limits = c(6, 10))

Note

After setting tmap_mode to view, we need to set tmap_mode back to plot.

tmap_mode("plot")
tmap mode set to plotting

In the event that the interactive map above takes too long to load, you can view a static map of the whole of Nigeria below.

Note

This map is generated with tmap as well. However, the mode is set to plot, hence a static map is generated.

tm_shape(NGA) +
  tm_borders(col = "grey40", lwd = 1, lty = "solid")+
tm_shape(wp_functional)+
  tm_dots(legend.show = TRUE, col = "blue") +
  tm_shape(wp_nonfunctional) +
  tm_dots(legend.show = TRUE, col = "red") 

Geospatial Data Wrangling

Converting sf data to sp’s Spatial Class

As sp’s Spatial classes are still commonly used by many spatial analysis packages, we need to convert the sf data frames into Spatial classes.

This can be done with the as_Spatial() function of the sf package. Find out more about as_Spatial() in the sf documentation.

wp_functional_spatial <- as_Spatial(wp_functional)
wp_nonfunctional_spatial <- as_Spatial(wp_nonfunctional)
NGA_spatial <- as_Spatial(NGA)

Viewing each of the sp Spatial Class

Here we will view the Spatial Class of the converted data frame.

Functional

wp_functional_spatial 
class       : SpatialPointsDataFrame 
features    : 52148 
extent      : 29322.63, 1218553, 33758.37, 1092629  (xmin, xmax, ymin, ymax)
crs         : +proj=tmerc +lat_0=4 +lon_0=8.5 +k=0.99975 +x_0=670553.98 +y_0=0 +a=6378249.145 +rf=293.465 +towgs84=-92,-93,122,0,0,0,0 +units=m +no_defs 
variables   : 1
names       :              status_clean 
min values  :                Functional 
max values  : Functional but not in use 

Non Functional

wp_nonfunctional_spatial
class       : SpatialPointsDataFrame 
features    : 32204 
extent      : 28907.91, 1209690, 33736.93, 1092883  (xmin, xmax, ymin, ymax)
crs         : +proj=tmerc +lat_0=4 +lon_0=8.5 +k=0.99975 +x_0=670553.98 +y_0=0 +a=6378249.145 +rf=293.465 +towgs84=-92,-93,122,0,0,0,0 +units=m +no_defs 
variables   : 1
names       :                     status_clean 
min values  :                        Abandoned 
max values  : Non functional due to dry season 

NGA

NGA_spatial
class       : SpatialPolygonsDataFrame 
features    : 774 
extent      : 26662.71, 1344157, 30523.38, 1096029  (xmin, xmax, ymin, ymax)
crs         : +proj=tmerc +lat_0=4 +lon_0=8.5 +k=0.99975 +x_0=670553.98 +y_0=0 +a=6378249.145 +rf=293.465 +towgs84=-92,-93,122,0,0,0,0 +units=m +no_defs 
variables   : 4
names       :   ADM2_EN, ADM2_PCODE, ADM1_EN, ADM1_PCODE 
min values  : Aba North,   NG001001,    Abia,      NG001 
max values  :      Zuru,   NG037014, Zamfara,      NG037 

Converting the Spatial Class into generic sp format

The main package that we are using to analyse our data is spatstat. However, it requires the data to be in the form of a ppp object.

There is no direct way for us to convert a Spatial*DataFrame into a ppp object. It first needs to be converted into a generic Spatial object (SpatialPoints or SpatialPolygons) before we can convert it into a ppp object.

We can make use of the as() function from base R (the methods package) to convert it into a generic Spatial object. Find out more about as() in the R documentation.

Note

Note that we are converting the water points into “SpatialPoints”, while the boundaries are converted into “SpatialPolygons”.

wp_functional_sp <- as(wp_functional_spatial, "SpatialPoints")
wp_nonfunctional_sp <- as(wp_nonfunctional_spatial, "SpatialPoints")
NGA_sp <- as(NGA_spatial, "SpatialPolygons")

Viewing each of the Spatial Points

Functional

wp_functional_sp 
class       : SpatialPoints 
features    : 52148 
extent      : 29322.63, 1218553, 33758.37, 1092629  (xmin, xmax, ymin, ymax)
crs         : +proj=tmerc +lat_0=4 +lon_0=8.5 +k=0.99975 +x_0=670553.98 +y_0=0 +a=6378249.145 +rf=293.465 +towgs84=-92,-93,122,0,0,0,0 +units=m +no_defs 

Non Functional

wp_nonfunctional_sp
class       : SpatialPoints 
features    : 32204 
extent      : 28907.91, 1209690, 33736.93, 1092883  (xmin, xmax, ymin, ymax)
crs         : +proj=tmerc +lat_0=4 +lon_0=8.5 +k=0.99975 +x_0=670553.98 +y_0=0 +a=6378249.145 +rf=293.465 +towgs84=-92,-93,122,0,0,0,0 +units=m +no_defs 

NGA

NGA_sp
class       : SpatialPolygons 
features    : 774 
extent      : 26662.71, 1344157, 30523.38, 1096029  (xmin, xmax, ymin, ymax)
crs         : +proj=tmerc +lat_0=4 +lon_0=8.5 +k=0.99975 +x_0=670553.98 +y_0=0 +a=6378249.145 +rf=293.465 +towgs84=-92,-93,122,0,0,0,0 +units=m +no_defs 

Converting into spatstat’s ppp format

We can now convert the Spatial Object into a ppp format

wp_functional_ppp <- as(wp_functional_sp , "ppp")
wp_nonfunctional_ppp <- as(wp_nonfunctional_sp , "ppp")

Viewing each ppp format

Functional

wp_functional_ppp
Planar point pattern: 52148 points
window: rectangle = [29322.6, 1218553.3] x [33758.4, 1092628.9] units

Non Functional

wp_nonfunctional_ppp
Planar point pattern: 32204 points
window: rectangle = [28907.9, 1209690] x [33736.9, 1092882.6] units
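The as(x, "ppp") coercion used above relies on maptools, which may not be available in newer R installations. As a hedged alternative, the sketch below builds a ppp object directly from the coordinates of an sf geometry using spatstat's own ppp() and owin() constructors. The three toy points are made up, and the bounding-box window is only a stand-in for a proper study area.

```r
library(sf)
library(spatstat)

# Toy projected points (coordinates in metres; values are hypothetical).
pts <- st_sfc(st_point(c(100, 200)),
              st_point(c(300, 400)),
              st_point(c(500, 100)))

# Extract the coordinate matrix (columns "X" and "Y").
xy <- st_coordinates(pts)

# Use the bounding box of the points as a rectangular window,
# mirroring the default rectangle shown in the output above.
win <- owin(xrange = range(xy[, "X"]), yrange = range(xy[, "Y"]))

# Build the planar point pattern.
toy_ppp <- ppp(x = xy[, "X"], y = xy[, "Y"], window = win)
toy_ppp
```

This is only a sketch of the idea. The rest of this page continues with the ppp objects created through the sp route.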

Checking for Duplicate Points

Note

It is important to check for duplicate points. There are several ways of handling duplicate points if they exist, such as jittering them with rjitter().

We now need to perform a check to make sure that there are no duplicated points, which we can do using the duplicated() function.

any(duplicated(wp_functional_ppp))
[1] FALSE
any(duplicated(wp_nonfunctional_ppp))
[1] FALSE

Since the result is FALSE for both patterns, there are no duplicated points and no further action is needed.
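Although our data has no duplicates, it may help to see what the check and the possible fixes look like when duplicates do exist. The sketch below uses a toy pattern with one repeated location; the coordinates and the jitter radius of 0.01 are arbitrary choices for illustration.

```r
library(spatstat)
set.seed(1234)

# Toy pattern in the unit square; the first two points coincide.
# ppp() may warn about the duplicated point, which is expected here.
X <- ppp(x = c(0.2, 0.2, 0.6, 0.8),
         y = c(0.3, 0.3, 0.5, 0.7),
         window = owin(c(0, 1), c(0, 1)))

any(duplicated(X))   # TRUE, because one location is repeated
multiplicity(X)      # how many points share each location

# Option 1: drop the repeated points.
X_unique <- unique(X)

# Option 2: keep all points but move each one by a small random distance.
X_jit <- rjitter(X, radius = 0.01, retry = TRUE)

any(duplicated(X_unique))
any(duplicated(X_jit))
```

Dropping duplicates changes the number of events, while jittering keeps the count but slightly alters the locations, so the choice depends on what the duplicates represent.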

Creating Owin Object

Note

An owin object is used to define the observation window, i.e. the polygonal region of interest.

We now want to confine the analysis to the geographical boundary of Nigeria. As such, we convert the SpatialPolygons object into an owin object to represent this polygonal region.

NGA_owin <- as(NGA_sp, "owin")

Plotting the Owin

plot(NGA_owin)

Combining Non Functional Water Point and Functional Water Point with Owin

Now we can extract the water point events that fall within the window and combine them with the owin object.

wp_functional_NGA_ppp = wp_functional_ppp[NGA_owin]
wp_nonfunctional_NGA_ppp = wp_nonfunctional_ppp[NGA_owin]
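The square-bracket subsetting above keeps only the points inside the window and replaces the pattern's window with it. A minimal sketch with toy data, using a rectangular window instead of the Nigeria boundary, looks like this:

```r
library(spatstat)

# Toy pattern observed in a 2 x 2 square (coordinates are made up).
X <- ppp(x = c(0.5, 1.5, 0.25, 1.75),
         y = c(0.5, 0.5, 1.5, 0.25),
         window = owin(c(0, 2), c(0, 2)))

# Smaller window of interest: the unit square.
W <- owin(c(0, 1), c(0, 1))

# Keep only the points that fall inside W; the result uses W as its window.
X_in <- X[W]

npoints(X)     # points before clipping
npoints(X_in)  # points remaining inside W
```

The same idea applies when the window is an irregular polygon such as NGA_owin.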

Plotting the Owin Object with Water Points

We now plot the clipped point patterns to check that the step was successful.

Functional

plot(wp_functional_NGA_ppp)

Non Functional

plot(wp_nonfunctional_NGA_ppp)

Kernel Density Estimation

As the point maps above show, it is very difficult to gain any insight from them. The darker regions suggest where most of the water points are, but the overlapping symbols hide how dense each area really is. A kernel density map is much better suited to this.

We therefore perform Kernel Density Estimation (KDE) to measure the intensity of the point process.

Rescaling the KDE values

One issue with the ppp data is that the unit of measurement is metres, which causes the density values to be very small.

We therefore convert the unit of measurement to kilometres so that the density values are easier to interpret.

wp_functional_NGA_ppp.km <- rescale(wp_functional_NGA_ppp, 1000, "km")
wp_nonfunctional_NGA_ppp.km <- rescale(wp_nonfunctional_NGA_ppp, 1000, "km")
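The effect of rescaling on the density values can be seen with a toy pattern. This is only a sketch with made-up coordinates: three points in a 1000 m by 1000 m window.

```r
library(spatstat)

# Hypothetical pattern measured in metres
X_m <- ppp(x = c(100, 500, 900), y = c(200, 600, 800),
           window = owin(c(0, 1000), c(0, 1000)))

# Same pattern with coordinates divided by 1000 and the unit labelled "km"
X_km <- rescale(X_m, 1000, "km")

# Average intensity = number of points / window area
int_m  <- intensity(X_m)    # points per square metre (very small)
int_km <- intensity(X_km)   # points per square kilometre
```

The two values describe the same pattern; they differ by a factor of one million because one square kilometre contains one million square metres.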

Plotting the Kernel Density Map

Now that we have rescaled the data sets, we can perform Kernel Density Estimation on them.

We can use the density() function from the spatstat package to generate the kernel density map. Find out more about density() from spatstat here. We will be making use of the automatic bandwidth selection methods here.

Note

One thing to note is that spatstat provides several functions for selecting the bandwidth automatically, including bw.diggle, bw.CvL, bw.scott and bw.ppl. The kernel also needs to be chosen (I will be using the default, “Gaussian”).

According to Prof, bw.ppl() is suggested as more appropriate when the pattern consists predominantly of tight clusters, while bw.diggle() works best when we are trying to detect a single tight cluster in the midst of random noise. Both are commonly used.

I have chosen to use bw.diggle(), but results might vary if bw.ppl() is used.
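To get a feel for how much the selectors can differ, the sketch below runs them on redwood, a small example data set that ships with spatstat. It is not the water point data, so the numbers only illustrate that the choice of selector matters.

```r
library(spatstat)

# Bandwidths chosen by each automatic selector for the same pattern
sigma_diggle <- as.numeric(bw.diggle(redwood))
sigma_ppl    <- as.numeric(bw.ppl(redwood))
sigma_cvl    <- as.numeric(bw.CvL(redwood))
sigma_scott  <- as.numeric(bw.scott(redwood))  # one value per axis (x, y)

# Side-by-side comparison (bw.scott is averaged over the two axes here)
sigmas <- c(diggle = sigma_diggle, ppl = sigma_ppl,
            CvL = sigma_cvl, scott = mean(sigma_scott))
print(round(sigmas, 4))
```

A smaller sigma gives a spikier surface that follows individual clusters, while a larger sigma gives a smoother surface that shows broad trends.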

Functional Kernel Density Map

wp_functional_NGA.bw <- density(wp_functional_NGA_ppp.km, sigma=bw.diggle, edge=TRUE, kernel="gaussian")
plot(wp_functional_NGA.bw)

Non Functional Kernel Density Map

wp_nonfunctional_NGA.bw <- density(wp_nonfunctional_NGA_ppp.km, sigma=bw.diggle, edge=TRUE, kernel="gaussian")
plot(wp_nonfunctional_NGA.bw)

Converting to Grid Object

We need to convert the KDE output into a format suitable for mapping. The format we are converting it to is a grid object.

Functional

gridded_kde_wp_functional_NG_bw <- as.SpatialGridDataFrame.im(wp_functional_NGA.bw)
spplot(gridded_kde_wp_functional_NG_bw)

Non Functional

gridded_kde_wp_nonfunctional_NG_bw <- as.SpatialGridDataFrame.im(wp_nonfunctional_NGA.bw)
spplot(gridded_kde_wp_nonfunctional_NG_bw)

Converting to Raster

Once the data is in a grid object, we convert it into a RasterLayer using the raster() function of the raster package. Find out more about raster package here.

tmap accepts a raster object as one of its plotting layers, which is why we convert to this format.

kde_wp_functional_NG_bw_raster <- raster(gridded_kde_wp_functional_NG_bw)
kde_wp_nonfunctional_NG_raster <- raster(gridded_kde_wp_nonfunctional_NG_bw)

Note

The raster object does not yet carry any CRS information, so we need to assign the CRS to the raster layer. In this case the EPSG code is 26392.

projection(kde_wp_functional_NG_bw_raster) <- CRS("+init=EPSG:26392")
projection(kde_wp_nonfunctional_NG_raster) <- CRS("+init=EPSG:26392")
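One thing that may be worth checking, offered as a hedged observation: the KDE was computed on coordinates rescaled to kilometres, whereas EPSG:26392 is defined in metres. If the raster is to be overlaid on a basemap, its extent may need to be scaled back to metres first. The sketch below shows one possible way to do this on a small made-up raster (r_km is hypothetical and only borrows the extent magnitudes printed for the KDE rasters).

```r
library(raster)

# Hypothetical 2 x 2 raster whose extent is expressed in kilometres
r_km <- raster(nrows = 2, ncols = 2,
               xmn = 26.66271, xmx = 1344.157,
               ymn = 30.52338, ymx = 1096.029)
values(r_km) <- 1:4

# Scale each edge of the extent by 1000 so that coordinates are in metres
r_m <- r_km
e <- extent(r_km)
extent(r_m) <- extent(xmin(e) * 1000, xmax(e) * 1000,
                      ymin(e) * 1000, ymax(e) * 1000)

extent(r_m)
```

The cell values are untouched, so they would still be densities per square kilometre; only the georeferencing changes.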

Viewing Raster Object

Functional

kde_wp_functional_NG_bw_raster
class      : RasterLayer 
dimensions : 128, 128, 16384  (nrow, ncol, ncell)
resolution : 10.29292, 8.324266  (x, y)
extent     : 26.66271, 1344.157, 30.52338, 1096.029  (xmin, xmax, ymin, ymax)
crs        : +init=EPSG:26392 
source     : memory
names      : v 
values     : -4.383912e-16, 4.843545  (min, max)

Non Functional

kde_wp_nonfunctional_NG_raster
class      : RasterLayer 
dimensions : 128, 128, 16384  (nrow, ncol, ncell)
resolution : 10.29292, 8.324266  (x, y)
extent     : 26.66271, 1344.157, 30.52338, 1096.029  (xmin, xmax, ymin, ymax)
crs        : +init=EPSG:26392 
source     : memory
names      : v 
values     : -2.410108e-16, 1.517255  (min, max)

Plotting Kernel Density Map of the Whole of Nigeria

Now that we have the kernel density rasters for the whole of Nigeria, we can plot them using tmap.

Functional

tm_shape(kde_wp_functional_NG_bw_raster) + 
  tm_raster("v") +
  tm_layout(main.title = "KDE of Functional Water Point (NGA)",
            legend.position = c("right", "bottom"), frame = FALSE)
Variable(s) "v" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.

Non Functional

tm_shape(kde_wp_nonfunctional_NG_raster) + 
  tm_raster("v") +
  tm_layout(main.title = "KDE of Non Functional Water Point (NGA)",
            legend.position = c("right", "bottom"), frame = FALSE)
Variable(s) "v" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.

Plotting Kernel Density Map of Osun State

Note

Most of the steps here are the same as those above.

Extracting Osun State from NGA

We can now repeat the same steps to generate the kernel density map of Osun State.

osun = NGA_spatial[NGA_spatial@data$ADM1_EN == "Osun",]
plot(osun)

osun_sp = as(osun, "SpatialPolygons")
osun_owin = as(osun_sp, "owin")

Merging Water Data point with Osun Owin

wp_functional_osun_ppp = wp_functional_ppp[osun_owin]
wp_nonfunctional_osun_ppp = wp_nonfunctional_ppp[osun_owin]

Rescaling Data Points

wp_functional_osun_ppp.km = rescale(wp_functional_osun_ppp, 1000, "km")
wp_nonfunctional_osun_ppp.km = rescale(wp_nonfunctional_osun_ppp, 1000, "km")

We plot the functional water points of Osun to take a look.

plot(wp_functional_osun_ppp.km)

Generating KDE

wp_nonfunctional_osun.bw <- density(wp_nonfunctional_osun_ppp.km, sigma=bw.diggle, edge=TRUE, kernel="gaussian")
plot(wp_nonfunctional_osun.bw)

wp_functional_osun.bw <- density(wp_functional_osun_ppp.km, sigma=bw.diggle, edge=TRUE, kernel="gaussian")
plot(wp_functional_osun.bw)

Converting KDE to Raster

gridded_kde_wp_functional_osun_bw <- as.SpatialGridDataFrame.im(wp_functional_osun.bw)
spplot(gridded_kde_wp_functional_osun_bw)

gridded_kde_wp_nonfunctional_osun_bw <- as.SpatialGridDataFrame.im(wp_nonfunctional_osun.bw)
spplot(gridded_kde_wp_nonfunctional_osun_bw)

kde_wp_functional_osun_bw_raster <- raster(gridded_kde_wp_functional_osun_bw)
kde_wp_nonfunctional_osun_bw_raster <- raster(gridded_kde_wp_nonfunctional_osun_bw)
projection(kde_wp_functional_osun_bw_raster) <- CRS("+init=EPSG:26392")
projection(kde_wp_nonfunctional_osun_bw_raster) <- CRS("+init=EPSG:26392")
kde_wp_functional_osun_bw_raster
class      : RasterLayer 
dimensions : 128, 128, 16384  (nrow, ncol, ncell)
resolution : 0.8948485, 0.9616045  (x, y)
extent     : 176.5032, 291.0438, 331.4347, 454.5201  (xmin, xmax, ymin, ymax)
crs        : +init=EPSG:26392 
source     : memory
names      : v 
values     : -4.876249e-15, 25.49435  (min, max)

Plotting KDE of Osun

tm_shape(kde_wp_functional_osun_bw_raster) + 
  tm_raster("v") +
  tm_layout(main.title = "KDE of Functional Water Point (Osun)",
            legend.position = c("right", "bottom"), frame = FALSE)
Variable(s) "v" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.

tm_shape(kde_wp_nonfunctional_osun_bw_raster) + 
  tm_raster("v") +
  tm_layout(main.title = "KDE of Non Functional Water Point (Osun)",
            legend.position = c("right", "bottom"), frame = FALSE)
Variable(s) "v" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.

Analysis of Kernel Density Points

Testing for Distribution: Clark and Evans Test

Note

The Clark and Evans test is a test of aggregation. Further testing would need to be performed in order for us to determine whether the hypothesis is true. Find out more about clarkevans.test() here.

The test hypotheses are:

H0 = The water points in Osun are randomly distributed.

H1 = The water points in Osun are not randomly distributed.

The 95% confidence level will be used.

# clipregion is only used with correction = "guard", so it is omitted here
clarkevans.test(wp_functional_osun_ppp,
                correction="none",
                alternative=c("clustered"),
                nsim=99)

    Clark-Evans test
    No edge correction
    Monte Carlo test based on 99 simulations of CSR with fixed n

data:  wp_functional_osun_ppp
R = 0.44767, p-value = 0.01
alternative hypothesis: clustered (R < 1)

Note

We therefore reject the null hypothesis that the functional water points are randomly distributed, as the p-value of 0.01 is smaller than the significance level of 0.05. Since R is less than 1, the pattern shows signs of clustering.

# clipregion is only used with correction = "guard", so it is omitted here
clarkevans.test(wp_nonfunctional_osun_ppp,
                correction="none",
                alternative=c("clustered"),
                nsim=99)

    Clark-Evans test
    No edge correction
    Monte Carlo test based on 99 simulations of CSR with fixed n

data:  wp_nonfunctional_osun_ppp
R = 0.44327, p-value = 0.01
alternative hypothesis: clustered (R < 1)

Note

We therefore reject the null hypothesis that the non functional water points are randomly distributed, as the p-value of 0.01 is smaller than the significance level of 0.05. Since R is less than 1, the pattern shows signs of clustering.
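To make the R statistic reported above more concrete, here is a minimal base R sketch of the uncorrected Clark-Evans index: the observed mean nearest-neighbour distance divided by the value expected under complete spatial randomness, 1 / (2 * sqrt(lambda)). The function name clark_evans_R and the four toy points are assumptions for illustration; clarkevans.test() additionally provides the Monte Carlo p-value.

```r
# Uncorrected Clark-Evans index for points (x, y) in a region of known area
clark_evans_R <- function(x, y, area) {
  d <- as.matrix(dist(cbind(x, y)))  # pairwise distances
  diag(d) <- Inf                     # ignore distance of a point to itself
  nn <- apply(d, 1, min)             # nearest-neighbour distance per point
  lambda <- length(x) / area         # average intensity
  mean(nn) / (1 / (2 * sqrt(lambda)))
}

# Toy data: two tight pairs of points in a unit square
x <- c(0.10, 0.11, 0.90, 0.91)
y <- c(0.10, 0.10, 0.90, 0.90)

R_index <- clark_evans_R(x, y, area = 1)
R_index
```

Here every nearest-neighbour distance is 0.01 while the value expected under randomness is 0.25, so R is far below 1, which is the signature of clustering. Values near 1 are consistent with randomness and values above 1 suggest regular spacing.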

Analysis of Functional and Non Functional Water point

At first glance, the 2 kernel density maps seem to show almost no pattern in where the water points are located.

However, upon closer inspection, 2 trends can be spotted from the differences in density. To make the trends easier to see, a map of Osun from Google Maps is included below.

Areas with High Functional Water Point and Non Functional Water Point are clustered together in the cities in Osun.

Copyrighted: Taken from Google Maps

Comparing the map above with the KDE maps, areas with a high density of functional and non functional water points seem to coincide with where the cities are located. This makes sense, as cities are where most people live and need access to water, so most of the water points are concentrated there.

In this regard, large cities tend to have a larger concentration of functional and non functional water points than smaller cities. On the other hand, areas of very low population density, such as the Both Nature Reserve, seem to have almost no water points, whether functional or not.

Important

Based on these observations, we can hypothesise that both the functional and the non functional water points are clustered in the state of Osun. Further validation using second-order spatial point pattern analysis would need to be performed.

Functional and Non Functional Water Point are clustered together in cities of Osun

Functional Water Point KDE

Non Functional Water Point KDE

We have already established in the previous observation that most of the water points are in the cities. When we observed the density of the functional and non functional water points in the state, we came to the hypothesis that the water points are clustered in the state of Osun. However, if we inspect the KDE of both types of water points more closely, focusing on the cities, we have reason to believe that the functional and non functional water points are also clustered among themselves within the cities.

The 2 images above are captures of the city of Ede. Even within the same city, the water points appear to be clustered. This can be observed at other locations in the functional and non functional water point KDE maps as well.

Important

Based on these observations, we can hypothesise that both the functional and the non functional water points are clustered within the cities of the state of Osun. Further validation using second-order spatial point pattern analysis would need to be performed.

Comparison between Kernel Density Map and Point Map.

Kernel density maps are less computationally intensive to display than point maps in an interactive map

A point map with its thousands of points is more computationally intensive to display than a kernel density map. This is because the computer needs to keep track of every individual point and draw it, which can be an issue for computers with low computational power.

A kernel density map provides an estimate of the concentration of points.

Kernel density estimation uses distance-weighted counts of the existing points, in a way similar to inverse-distance weighting, to estimate the concentration of points in a given area. The value of each cell is derived from the weights of the surrounding points, with the furthest points having the lowest weights. This smooths the resulting density surface and gives a more pronounced gradient.

Find out more about Inverse Distance Weighted here.
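The distance weighting can be sketched in a few lines of base R. This is a simplified illustration with an isotropic Gaussian kernel and no edge correction, so it is not identical to what density() computes with edge=TRUE. The function name gauss_kde_at and the coordinates are made up.

```r
# Gaussian kernel density at location (x0, y0), without edge correction:
# each point contributes exp(-d^2 / (2 sigma^2)) / (2 pi sigma^2),
# so the contribution shrinks as the distance d grows
gauss_kde_at <- function(px, py, x0, y0, sigma) {
  d2 <- (px - x0)^2 + (py - y0)^2
  sum(exp(-d2 / (2 * sigma^2)) / (2 * pi * sigma^2))
}

# Toy data: two points close together and one far away
px <- c(0, 0.5, 10)
py <- c(0, 0, 0)

dens_near_pair <- gauss_kde_at(px, py, x0 = 0, y0 = 0, sigma = 1)
dens_between   <- gauss_kde_at(px, py, x0 = 5, y0 = 0, sigma = 1)
c(near_pair = dens_near_pair, between = dens_between)
```

The location next to the pair receives a much higher value than the location halfway to the isolated point, which is the smooth gradient that a point map cannot show directly.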

2nd Order Spatial Point Analysis

Now that we have explored the spatial point patterns visually, we need to confirm our observations statistically, which is where hypothesis testing comes in.

Our Hypothesis:

  • H0: The Water Points are randomly distributed

  • H1: The Water Points are not randomly distributed

  • Confidence level : 99%

We will perform the hypothesis testing for 3 cities that I have selected, namely Osogbo, Ede and Iwo, as well as for the State of Osun as a whole. Functional and Non Functional Water Points are tested separately.

Before that, we need to extract the water point data for the 3 cities, following the same steps we used for Osun State.

Defining the Areas

# Extract the ADM2 areas that make up each city
iwo = NGA_spatial[NGA_spatial@data$ADM2_EN == "Iwo",] 
osogbo = NGA_spatial[NGA_spatial@data$ADM2_EN == "Osogbo",]
ede = NGA_spatial[NGA_spatial@data$ADM2_EN %in% c("Ede North","Ede South"),]

# Convert to SpatialPolygons, then to owin (same pattern as for Osun)
iwo_sp = as(iwo, "SpatialPolygons")
osogbo_sp = as(osogbo, "SpatialPolygons")
ede_sp = as(ede, "SpatialPolygons")
iwo_owin = as(iwo_sp, "owin")
osogbo_owin = as(osogbo_sp, "owin")
ede_owin = as(ede_sp, "owin")

# Clip the water points to each city window
wp_functional_iwo_ppp = wp_functional_ppp[iwo_owin]
wp_nonfunctional_iwo_ppp = wp_nonfunctional_ppp[iwo_owin]

wp_functional_osogbo_ppp = wp_functional_ppp[osogbo_owin]
wp_nonfunctional_osogbo_ppp = wp_nonfunctional_ppp[osogbo_owin]

wp_functional_ede_ppp = wp_functional_ppp[ede_owin]
wp_nonfunctional_ede_ppp = wp_nonfunctional_ppp[ede_owin]
wp_functional_iwo_ppp.km = rescale(wp_functional_iwo_ppp, 1000, "km")
wp_nonfunctional_iwo_ppp.km = rescale(wp_nonfunctional_iwo_ppp, 1000, "km")

wp_functional_osogbo_ppp.km = rescale(wp_functional_osogbo_ppp, 1000, "km")
wp_nonfunctional_osogbo_ppp.km = rescale(wp_nonfunctional_osogbo_ppp, 1000, "km")

wp_functional_ede_ppp.km = rescale(wp_functional_ede_ppp, 1000, "km")
wp_nonfunctional_ede_ppp.km = rescale(wp_nonfunctional_ede_ppp, 1000, "km")
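Since the clip-and-rescale steps are repeated for every area, they could be wrapped in a small helper. The sketch below is one possible way to do this on made-up data: clip_and_rescale, the toy pattern and the two rectangular windows are all hypothetical and stand in for the water points and city boundaries.

```r
library(spatstat)

# Hypothetical helper: clip a pattern to a window, then convert metres to km
clip_and_rescale <- function(X, W, metres_per_unit = 1000, unit = "km") {
  rescale(X[W], metres_per_unit, unit)
}

# Toy pattern in metres and two toy "city" windows
X <- ppp(x = c(500, 1500, 3500), y = c(500, 1500, 3500),
         window = owin(c(0, 4000), c(0, 4000)))
windows <- list(
  area_a = owin(c(0, 2000), c(0, 2000)),
  area_b = owin(c(3000, 4000), c(3000, 4000))
)

# One clipped, rescaled pattern per window
clipped <- lapply(windows, function(W) clip_and_rescale(X, W))
counts <- sapply(clipped, npoints)
counts
```

Keeping the patterns in a named list also makes it straightforward to apply the same test to every area later on.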

Computing G Function

Now that we have extracted the data for the 3 cities and converted them into ppp format, we can perform our 2nd Order Spatial Analysis on them.

For this we have chosen the G Function to test our hypothesis. As stated by our Professor, the G Function measures the distribution of distances from an arbitrary event to its nearest event. It estimates the nearest neighbour distance distribution from a point pattern in a window of arbitrary shape. The G Function is a useful statistic summarising one aspect of the "clustering" of points. As such, we will use it to test whether the points are randomly distributed or not.
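The idea behind the G Function can be sketched in base R. The version below has no edge correction, unlike Gest() with correction = "border", and the function names and toy points are assumptions for illustration. Under complete spatial randomness with intensity lambda, the theoretical curve is G(r) = 1 - exp(-lambda * pi * r^2).

```r
# Nearest-neighbour distance of every point
nn_dist <- function(x, y) {
  d <- as.matrix(dist(cbind(x, y)))
  diag(d) <- Inf
  apply(d, 1, min)
}

# Empirical G: share of points whose nearest neighbour is within distance r
G_hat <- function(x, y, r) {
  nn <- nn_dist(x, y)
  vapply(r, function(ri) mean(nn <= ri), numeric(1))
}

# Theoretical G under complete spatial randomness
G_csr <- function(r, lambda) 1 - exp(-lambda * pi * r^2)

# Toy data: two tight pairs of points in a unit square
x <- c(0.10, 0.11, 0.90, 0.91)
y <- c(0.10, 0.10, 0.90, 0.90)
r <- c(0.005, 0.02, 0.1)

g_obs  <- G_hat(x, y, r)
g_theo <- G_csr(r, lambda = length(x) / 1)
rbind(r = r, observed = g_obs, theoretical = g_theo)
```

For this clustered toy pattern the observed curve jumps to 1 at a very short distance and sits well above the theoretical curve, which is the same reading applied to the plots that follow.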

To perform the G Function analysis, we will use the Gest() function from spatstat. Find out more about Gest() from spatstat here. To test our hypothesis, we then perform a Monte Carlo simulation test with the G Function.
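When reading the envelopes, it helps to know how the number of simulations relates to the significance level. According to the spatstat documentation for envelope(), a two-sided test based on pointwise envelopes has significance level alpha = 2 * nrank / (1 + nsim). The helper name pointwise_alpha below is made up for this sketch.

```r
# Significance level of a two-sided pointwise envelope test
pointwise_alpha <- function(nsim, nrank = 1) {
  2 * nrank / (1 + nsim)
}

alpha_39  <- pointwise_alpha(39)    # 2 / 40
alpha_100 <- pointwise_alpha(100)   # 2 / 101
alpha_199 <- pointwise_alpha(199)   # 2 / 200

round(c(nsim_39 = alpha_39, nsim_100 = alpha_100, nsim_199 = alpha_199), 4)
```

Under this formula, nsim = 39 corresponds to a significance level of 0.05 and nsim = 199 to 0.01, while nsim = 100 gives roughly 0.02, which is worth keeping in mind when stating the confidence level of a pointwise envelope test.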

Iwo

Computing G_Function Estimate for Functional Water Point

G_iwo = Gest(wp_functional_iwo_ppp.km, correction = "border")
plot(G_iwo)

Testing the Hypothesis of Functional Water Point

Our Hypothesis:

  • H0: The distribution of the Functional Water Points in Iwo are randomly distributed

  • H1: The distribution of the Functional Water Points in Iwo are not randomly distributed

  • Confidence level : 99%

  • Significance level : 0.01

  • The null hypothesis will be rejected if p-value is smaller than alpha value of 0.01.

G_iwo_fuctional.csr <- envelope(wp_functional_iwo_ppp.km, Gest, nsim=100)
Generating 100 simulations of CSR  ...
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80,
81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99,  100.

Done.
plot(G_iwo_fuctional.csr)

Important

Conclusion: The observed G(r) is far above G(theo) as well as the envelope, indicating that Functional Water Points in the Iwo area are clustered. Hence, we reject the null hypothesis that Functional Water Points in the Iwo area are randomly distributed at the 99% confidence level.

Computing G_Function Estimate for Non Functional Water Point

G_iwo_non = Gest(wp_nonfunctional_iwo_ppp.km, correction = "border")
plot(G_iwo_non)

Testing the Hypothesis of Non Functional Water Point

Our Hypothesis:

  • H0: The distribution of the Non Functional Water Points in Iwo are randomly distributed

  • H1: The distribution of the Non Functional Water Points in Iwo are not randomly distributed

  • Confidence level : 99%

  • Significance level : 0.01

  • The null hypothesis will be rejected if p-value is smaller than alpha value of 0.01.

G_iwo_nonfuctional.csr <- envelope(wp_nonfunctional_iwo_ppp.km, Gest, nsim=100)
Generating 100 simulations of CSR  ...
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80,
81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99,  100.

Done.
plot(G_iwo_nonfuctional.csr)

Important

Conclusion: The observed G(r) is far above G(theo) as well as the envelope, indicating that Non Functional Water Points in the Iwo area are clustered. Hence, we reject the null hypothesis that Non Functional Water Points in the Iwo area are randomly distributed at the 99% confidence level.

Osogbo

Computing G_Function Estimate for Functional Water Point

G_osogbo = Gest(wp_functional_osogbo_ppp.km, correction = "border")
plot(G_osogbo)

Testing the Hypothesis of Functional Water Point

Our Hypothesis:

  • H0: The distribution of the Functional Water Points in Osogbo are randomly distributed

  • H1: The distribution of the Functional Water Points in Osogbo are not randomly distributed

  • Confidence level : 99%

  • Significance level : 0.01

  • The null hypothesis will be rejected if p-value is smaller than alpha value of 0.01.

G_osogbo_fuctional.csr <- envelope(wp_functional_osogbo_ppp.km, Gest, nsim=100)
Generating 100 simulations of CSR  ...
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80,
81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99,  100.

Done.
plot(G_osogbo_fuctional.csr)

Important

Conclusion: The observed G(r) is far above G(theo) as well as the envelope, indicating that Functional Water Points in the Osogbo area are clustered. Hence, we reject the null hypothesis that Functional Water Points in the Osogbo area are randomly distributed at the 99% confidence level.

Computing G_Function Estimate for Non Functional Water Point

G_osogbo_non = Gest(wp_nonfunctional_osogbo_ppp.km, correction = "border")
plot(G_osogbo_non)

Testing the Hypothesis of Non Functional Water Point

Our Hypothesis:

  • H0: The distribution of the Non Functional Water Points in Osogbo are randomly distributed

  • H1: The distribution of the Non Functional Water Points in Osogbo are not randomly distributed

  • Confidence level : 99%

  • Significance level : 0.01

  • The null hypothesis will be rejected if p-value is smaller than alpha value of 0.01.

G_osogbo_nonfuctional.csr <- envelope(wp_nonfunctional_osogbo_ppp.km, Gest, nsim=100)
Generating 100 simulations of CSR  ...
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80,
81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99,  100.

Done.
plot(G_osogbo_nonfuctional.csr)

Important

Conclusion: The observed G(r) is far above G(theo) as well as the envelope, indicating that Non Functional Water Points in the Osogbo area are clustered. Hence, we reject the null hypothesis that Non Functional Water Points in the Osogbo area are randomly distributed at the 99% confidence level.

Ede

Computing G_Function Estimate for Functional Water Point

G_ede = Gest(wp_functional_ede_ppp.km, correction = "border")
plot(G_ede)

Testing the Hypothesis of Functional Water Point

Our Hypothesis:

  • H0: The distribution of the Functional Water Points in Ede are randomly distributed

  • H1: The distribution of the Functional Water Points in Ede are not randomly distributed

  • Confidence level : 99%

  • Significance level : 0.01

  • The null hypothesis will be rejected if p-value is smaller than alpha value of 0.01.

G_ede_fuctional.csr <- envelope(wp_functional_ede_ppp.km, Gest, nsim=100)
Generating 100 simulations of CSR  ...
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80,
81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99,  100.

Done.
plot(G_ede_fuctional.csr)

Important

Conclusion: The observed G(r) is far above G(theo) as well as the envelope, indicating that Functional Water Points in the Ede area are clustered. Hence, we reject the null hypothesis that Functional Water Points in the Ede area are randomly distributed at the 99% confidence level.

Computing G_Function Estimate for Non Functional Water Point

G_ede_non = Gest(wp_nonfunctional_ede_ppp.km, correction = "border")
plot(G_ede_non)

Testing the Hypothesis of Non Functional Water Point

Our Hypothesis:

  • H0: The distribution of the Non Functional Water Points in Ede are randomly distributed

  • H1: The distribution of the Non Functional Water Points in Ede are not randomly distributed

  • Confidence level : 99%

  • Significance level : 0.01

  • The null hypothesis will be rejected if p-value is smaller than alpha value of 0.01.

G_ede_nonfuctional.csr <- envelope(wp_nonfunctional_ede_ppp.km, Gest, nsim=100)
Generating 100 simulations of CSR  ...
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80,
81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99,  100.

Done.
plot(G_ede_nonfuctional.csr)

Important

Conclusion: The observed G(r) is far above G(theo) as well as the envelope, indicating that Non Functional Water Points in the Ede area are clustered. Hence, we reject the null hypothesis that Non Functional Water Points in the Ede area are randomly distributed at the 99% confidence level.

Osun

Computing G_Function Estimate for Functional Water Point

G_osun = Gest(wp_functional_osun_ppp.km, correction = "border") 
plot(G_osun)

Testing the Hypothesis for Functional Water Point

Our Hypothesis:

  • H0: The distribution of the Functional Water Points in Osun are randomly distributed

  • H1: The distribution of the Functional Water Points in Osun are not randomly distributed

  • Confidence level : 99%

  • Significance level : 0.01

  • The null hypothesis will be rejected if p-value is smaller than alpha value of 0.01.

G_osun_fuctional.csr <- envelope(wp_functional_osun_ppp.km, Gest, nsim=100)
Generating 100 simulations of CSR  ...
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80,
81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99,  100.

Done.
plot(G_osun_fuctional.csr)

Important

Conclusion: The observed G(r) is far above G(theo) as well as the envelope, indicating that Functional Water Points in the Osun area are clustered. Hence, we reject the null hypothesis that Functional Water Points in the Osun area are randomly distributed at the 99% confidence level.

Computing G_Function Estimate for Non Functional Water Point

G_osun_non = Gest(wp_nonfunctional_osun_ppp.km, correction = "border") 
plot(G_osun_non)

Hypothesis Testing For Non Functional Water Point

Our Hypothesis:

  • H0: The distribution of the Non Functional Water Points in Osun are randomly distributed

  • H1: The distribution of the Non Functional Water Points in Osun are not randomly distributed

  • Confidence level : 99%

  • Significance level : 0.01

  • The null hypothesis will be rejected if p-value is smaller than alpha value of 0.01.

G_osun_nonfuctional.csr <- envelope(wp_nonfunctional_osun_ppp.km, Gest, nsim=100)
Generating 100 simulations of CSR  ...
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80,
81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99,  100.

Done.
plot(G_osun_nonfuctional.csr)

Important

Conclusion: The observed G(r) is far above G(theo) as well as the envelope, indicating that Non Functional Water Points in the Osun area are clustered. Hence, we reject the null hypothesis that Non Functional Water Points in the Osun area are randomly distributed at the 99% confidence level.

Conclusion for 2nd Order Spatial Point Analysis

Areas with High Functional Water Point and Non Functional Water Point are clustered together in the cities in Osun

Based on our G Function analysis, we have determined that both the Functional Water Points and the Non Functional Water Points in Osun are not randomly distributed at the 99% confidence level. Hence, we can conclude that they are clustered.

Functional and Non Functional Water Point are clustered together in cities of Osun

Based on our G Function analysis, we have determined that both the Functional Water Points and the Non Functional Water Points in the 3 main cities of Iwo, Osogbo and Ede are not randomly distributed at the 99% confidence level. Hence, we can conclude that they are clustered.

Spatial Correlation Analysis

Now that we have determined that both the Functional and Non Functional Water Points are not randomly distributed but clustered, an interesting question arises: are they independent of each other? In other words, does the presence of a water point affect the presence of another water point?

Therefore we have come up with a new pair of hypotheses, tested separately for Functional and Non Functional Water Points:

Our Hypothesis:

  • H0: The Water Points are distributed independently of each other

  • H1: The Water Points are not distributed independently of each other

  • Confidence level : 95%

We will be testing this Hypothesis on the Functional and Non Functional Water Points only in the State of Osun.

Computing L Function

To test for spatial correlation we can use the K Function. The K Function is a popular technique for analysing spatial correlation in point patterns. According to Prof, it measures the number of events found up to a given distance of any particular event. This makes it well suited to spatial correlation analysis.

However, the K Function makes it difficult to discern the difference between the theoretical K and the estimated K at small distances, so we will use the L Function instead. The L Function is a square root transformation of the K Function, L(r) = sqrt(K(r)/π), which in theory stabilises the variance of the estimator.
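The relationship between the two functions can be checked numerically. Under complete spatial randomness the theoretical K Function is K(r) = pi * r^2, so the transformation L(r) = sqrt(K(r) / pi) turns the theoretical curve into the straight line L(r) = r, and L(r) - r becomes a flat line at zero. The helper name K_to_L below is made up for this sketch.

```r
# Transform K values into L values
K_to_L <- function(K) sqrt(K / pi)

r <- seq(0, 5, by = 0.5)

# Theoretical K under complete spatial randomness
K_csr <- pi * r^2

# After the transformation the theoretical curve is simply r
L_csr <- K_to_L(K_csr)
centred <- L_csr - r   # what the plots of L(d) - r show for the theoretical line

max(abs(centred))
```

This is why the plots use L(d) - r on the vertical axis: any departure of the observed curve from the horizontal zero line is then easy to see, even at small distances.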

To perform the L Function analysis, we will use the Lest() function from spatstat. Find out more about Lest() Function from spatstat here. To test our hypothesis, we then perform a Monte Carlo simulation test with the L Function.

Functional Water Point

Computing L Function Estimate

L_fun_osun = Lest(wp_functional_osun_ppp.km, correction = "Ripley")
plot(L_fun_osun, . -r ~ r, 
     ylab= "L(d)-r", xlab = "d")

Testing the Hypothesis on Functional Water Point

Our Hypothesis:

  • H0: The distribution of the Functional Water Points in Osun are independent of each other

  • H1: The distribution of the Functional Water Points in Osun are not independent of each other

  • Confidence level : 95%

  • Significance level : 0.05

  • The null hypothesis will be rejected if p-value is smaller than alpha value of 0.05.

L_fun_osun.csr <- envelope(wp_functional_osun_ppp.km, Lest, nsim = 39, nrank = 1)
plot(L_fun_osun.csr, . - r ~ r, xlab="d", ylab="L(d)-r")

Important

Conclusion: The observed L(r)-r is far above the theoretical L(r)-r as well as the envelope for almost all distances d, indicating that Functional Water Points in Osun are not independent of each other. Hence, we reject the null hypothesis that Functional Water Points in Osun are independent of each other at the 95% confidence level.

Non Functional Water Point

Computing L Function Estimate

L_non_fun_osun = Lest(wp_nonfunctional_osun_ppp.km, correction = "Ripley")
plot(L_non_fun_osun, . -r ~ r, 
     ylab= "L(d)-r", xlab = "d")

Testing the Hypothesis of Non Functional Water Point

Our Hypothesis:

  • H0: The distribution of the Non Functional Water Points in Osun are independent of each other

  • H1: The distribution of the Non Functional Water Points in Osun are not independent of each other

  • Confidence level : 95%

  • Significance level : 0.05

  • The null hypothesis will be rejected if p-value is smaller than alpha value of 0.05.

L_non_fun_osun.csr <- envelope(wp_nonfunctional_osun_ppp.km, Lest, nsim = 39, nrank = 1)
plot(L_non_fun_osun.csr, . - r ~ r, xlab="d", ylab="L(d)-r")

Important

Conclusion: The observed L(r)-r is far above the theoretical L(r)-r as well as the envelope for almost all distances d, indicating that Non Functional Water Points in Osun are not independent of each other. Hence, we reject the null hypothesis that Non Functional Water Points in Osun are independent of each other at the 95% confidence level.

Conclusion

Functional Water Point are not independent of each other in the State of Osun

Based on the L Function test, we reject the null hypothesis that Functional Water Points in Osun are independent of each other at the 95% confidence level. We can therefore conclude that Functional Water Points in the Osun area are not independent of each other.

Non Functional Water Point are not independent of each other in the State of Osun

Based on the L Function test, we reject the null hypothesis that Non Functional Water Points in Osun are independent of each other at the 95% confidence level. We can therefore conclude that Non Functional Water Points in the Osun area are not independent of each other.

Special Thanks

Special Thanks to Dr Kam Tin Seong for his guidance and help.